Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation
Authors
Abstract
This paper describes a novel approach based on voice conversion (VC) to speaker-adaptive speech synthesis for speech-to-speech translation. The voice quality of translated speech in an output language usually differs from that of the input speaker of the translation system, since the text-to-speech system is developed with another speaker’s voice in the output language. To render the input speaker’s voice quality in the translated speech, we propose a voice quality control method based on one-to-many eigenvoice conversion (EVC) and language-dependent prosodic conversion. Spectral parameters of the translated speech are effectively converted by one-to-many EVC, which enables unsupervised speaker adaptation. Moreover, prosodic parameters are modified considering their global differences between the input and output languages. The effectiveness of the proposed method is confirmed by experimental evaluations on cross-lingual VC among Japanese, English, and Chinese.
Similar resources
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics
This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or HMM-based speech synthesis, which synthesizes foreign language speech of a specific non-native speaker reflecting the speaker-dependent acoustic characteristics extracted from the speaker’s natural speech in his/h...
Eigenvoice-based Approach to Voice Conversion and Voice Quality Control
This paper reviews our proposed approach to voice conversion (VC) and voice quality control based on an eigenvoice technique. VC is a technique to modify nonlinguistic information such as speaker individuality while keeping linguistic information unchanged. In the traditional VC framework, a conversion model for a source and target speaker-pair needs to be trained in advance using a parallel da...
HMM adaptation and voice conversion for the synthesis of child speech: a comparison
This study compares two different methodologies for producing data-driven synthesis of child speech from existing systems that have been trained on the speech of adults. On one hand, an existing statistical parametric synthesiser is transformed using model adaptation techniques, informed by linguistic and prosodic knowledge, to the speaker characteristics of a child speaker. This is compared wi...
Adaptive Training for Voice Conversion Based on Eigenvoices
In this paper, we describe a novel model training method for one-to-many eigenvoice conversion (EVC). One-to-many EVC is a technique for converting a specific source speaker’s voice into an arbitrary target speaker’s voice. An eigenvoice Gaussian mixture model (EVGMM) is trained in advance using multiple parallel data sets consisting of utterance-pairs of the source speaker and many pre-stored ...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...